
Document Clustering by Dynamic Hierarchical Algorithm Based on Fuzzy Set Type-ii From Frequent Itemset

Abstract

One way to facilitate information retrieval is to cluster the existing collection of documents. Text documents are often unstructured: their forms vary and their groupings are ambiguous, which makes retrieval difficult. Moreover, new documents emerge every second and need to be clustered. Static document clustering methods generally cluster documents only after the whole collection has been gathered, so re-clustering the entire collection each time a new document arrives is inefficient. In this paper, we propose a new method for document clustering that uses a dynamic hierarchical algorithm based on type-II fuzzy sets derived from frequent itemsets. The method has three main phases: determination of key terms, extraction of candidate clusters, and construction of the cluster hierarchy. In our experiments, the method achieved F-measure values of 0.40 on Newsgroup, 0.62 on Classic, and 0.38 on Reuters. The computation time when a new document is added is also lower than that of the previous static method. These results indicate that the method can produce hierarchical clusterings effectively and efficiently in a dynamic environment, and that its clustering results are accurate.
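The abstract does not spell out how candidate clusters are extracted from frequent itemsets. As a rough illustration of the general idea only, the sketch below treats every term set that occurs in enough documents as a candidate cluster and maps it to the documents containing it. The function name `candidate_clusters`, the `min_support` and `max_size` parameters, the whitespace tokenization, and the toy documents are all assumptions made for this example; it enumerates term sets by brute force and does not model the type-II fuzzy membership or the hierarchy construction described in the paper.

```python
from collections import defaultdict
from itertools import combinations


def candidate_clusters(docs, min_support=0.5, max_size=2):
    """Map each frequent term set to the ids of the documents that contain it.

    Hypothetical helper: brute-force enumeration of term sets up to
    `max_size` terms, kept only if their document share reaches `min_support`.
    """
    n = len(docs)
    cover = defaultdict(set)
    for doc_id, text in docs.items():
        terms = sorted(set(text.lower().split()))  # naive tokenization (assumption)
        for size in range(1, max_size + 1):
            for itemset in combinations(terms, size):
                cover[itemset].add(doc_id)
    # Keep only term sets whose support meets the threshold.
    return {s: ids for s, ids in cover.items() if len(ids) / n >= min_support}


# Toy collection, invented for illustration.
docs = {
    "d1": "fuzzy cluster document",
    "d2": "fuzzy cluster hierarchy",
    "d3": "news retrieval document",
}
clusters = candidate_clusters(docs, min_support=0.6)
for itemset, ids in sorted(clusters.items()):
    print(itemset, sorted(ids))
```

A document can fall into several candidate clusters at once (here `d1` is covered by both `("cluster", "fuzzy")` and `("document",)`), which is the kind of overlap that a fuzzy membership step would then need to resolve.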
